The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
Authors
Abstract
It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated ODE. This in turn implies convergence of the algorithm. Several specific classes of algorithms are considered as applications. It is found that the results provide (i) a simpler derivation of known results for reinforcement learning algorithms; (ii) a proof for the first time that a class of asynchronous stochastic approximation algorithms are convergent without using any a priori assumption of stability; (iii) a proof for the first time that asynchronous adaptive critic and Q-learning algorithms are convergent for the average cost optimal control problem.
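To make the setting concrete, here is a minimal, self-contained sketch of a scalar stochastic approximation recursion x_{n+1} = x_n + a_n [h(x_n) + M_{n+1}] whose associated ODE dx/dt = h(x) has the origin as a globally asymptotically stable equilibrium, the situation in which the result above yields stability and hence convergence of the iterates. The mean field h, the step sizes, and the noise model are illustrative assumptions, not quantities taken from the paper.

```python
# Minimal sketch of a stochastic approximation recursion and its associated ODE.
# The mean field h, the step sizes a_n, and the noise model are illustrative
# assumptions, not quantities taken from the paper.
import numpy as np

def h(x):
    # Mean field of the associated ODE  dx/dt = h(x) = -x, whose origin is
    # globally asymptotically stable -- the condition invoked for stability.
    return -x

rng = np.random.default_rng(0)
x = 5.0                              # arbitrary initial iterate
for n in range(1, 10_001):
    a_n = 1.0 / n                    # diminishing steps: sum a_n = inf, sum a_n^2 < inf
    noise = rng.normal()             # martingale-difference noise M_{n+1}
    x += a_n * (h(x) + noise)        # x_{n+1} = x_n + a_n (h(x_n) + M_{n+1})

print(f"iterate after 10000 steps: {x:.4f}")  # tracks the ODE toward its equilibrium 0
```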
Similar Resources
Stability and Convergence of Stochastic Approximation using the O.D.E. Method - Proceedings of the 37th IEEE Conference on Decision and Control, 1998
It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated o.d.e. This in turn implies convergence of the algorithm. Several specific classes of algorithms are considered as applications. It is found that the results provide (i) a simpler derivation of known results for reinforcement learning algorithms; (ii) ...
Learning Algorithms for Risk-Sensitive Control
This is a survey of some reinforcement learning algorithms for risk-sensitive control on infinite horizon. Basics of the risk-sensitive control problem are recalled, notably the corresponding dynamic programming equation and the value and policy iteration methods for its solution. Basics of stochastic approximation algorithms are also sketched, in particular the ‘o.d.e.’ approach for its stabil...
LIDS-P-2172 Asynchronous Stochastic Approximation and Q-Learning
We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
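As a rough illustration of the asynchronous setting described above, the following sketch runs tabular Q-learning on a small synthetic MDP: each step updates only the single (state, action) component that was just visited, which is exactly the asynchronous stochastic approximation pattern. The MDP, the uniform-random behavior policy, and the per-entry 1/n step sizes are assumptions made for this example, not the construction analyzed in the report.

```python
# Minimal tabular Q-learning sketch on a small synthetic MDP.  The MDP, the
# uniform-random behavior policy, and the per-entry 1/n step sizes are
# assumptions for this example, not the construction analyzed in the report.
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(1)

# Hypothetical random MDP: P[s, a] is a distribution over next states, R[s, a] a reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

Q = np.zeros((n_states, n_actions))
visits = np.zeros((n_states, n_actions))     # per-component step-size schedule
s = 0
for _ in range(50_000):
    a = rng.integers(n_actions)              # uniform exploration keeps every entry updated
    s_next = rng.choice(n_states, p=P[s, a])
    visits[s, a] += 1
    alpha = 1.0 / visits[s, a]               # diminishing step for this entry only
    # Asynchronous update: only the visited (s, a) component of Q changes.
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(np.round(Q, 2))                        # approximate optimal Q-values
```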
APPROXIMATION OF STOCHASTIC PARABOLIC DIFFERENTIAL EQUATIONS WITH TWO DIFFERENT FINITE DIFFERENCE SCHEMES
We focus on the use of two stable and accurate explicit finite difference schemes in order to approximate the solution of stochastic partial differential equations of Itô type, in particular, parabolic equations. The main properties of these deterministic difference methods, i.e., convergence, consistency, and stability, are separately developed for the stochastic cases.
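For a flavor of what an explicit finite difference scheme for an Itô-type parabolic equation looks like, here is a minimal sketch under assumed simplifications (additive space-time white noise, homogeneous Dirichlet boundaries, forward Euler-Maruyama in time with a central second difference in space); it is not one of the two schemes analyzed in the paper.

```python
# Minimal sketch of an explicit finite difference / Euler-Maruyama step for an
# Ito-type stochastic heat equation  du = u_xx dt + sigma dW(t, x)  with
# homogeneous Dirichlet boundaries.  Parameters and the noise scaling are
# assumptions made for illustration; this is not one of the schemes in the paper.
import numpy as np

J, N = 50, 2000                       # spatial intervals, time steps
dx, dt, sigma = 1.0 / J, 1e-4, 0.1
assert dt <= 0.5 * dx**2              # explicit-scheme stability restriction

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, J + 1)
u = np.sin(np.pi * x)                 # illustrative initial condition, zero at both ends

for _ in range(N):
    lap = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2          # central second difference
    dW = rng.normal(scale=np.sqrt(dt / dx), size=J - 1)     # discretized space-time white noise
    u[1:-1] += dt * lap + sigma * dW                         # forward Euler-Maruyama update

print(f"discrete L2 norm at t = {N * dt:.2f}: {np.sqrt(dx * np.sum(u**2)):.4f}")
```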
Approximation of stochastic advection diffusion equations with finite difference scheme
In this paper, a high-order and conditionally stable stochastic difference scheme is proposed for the numerical solution of the Itô stochastic advection diffusion equation with a one-dimensional white noise process. We applied a fourth-order finite difference approximation for discretizing the spatial derivative of this equation. The main properties of deterministic difference schemes,...
Journal: SIAM J. Control and Optimization
Volume 38, Issue -
Pages -
Publication date: 2000